Evaluation of binary classifiers (English Wikipedia edition)
; sensitivity, recall, or true positive rate (TPR)
:<math>\mathit{TPR} = \mathit{TP} / P = \mathit{TP} / (\mathit{TP} + \mathit{FN})</math>
; specificity (SPC) or true negative rate
:<math>\mathit{SPC} = \mathit{TN} / N = \mathit{TN} / (\mathit{FP} + \mathit{TN})</math>
; precision or positive predictive value (PPV)
:<math>\mathit{PPV} = \mathit{TP} / (\mathit{TP} + \mathit{FP})</math>
; negative predictive value (NPV)
:<math>\mathit{NPV} = \mathit{TN} / (\mathit{TN} + \mathit{FN})</math>
; fall-out or false positive rate (FPR)
:<math>\mathit{FPR} = \mathit{FP} / N = \mathit{FP} / (\mathit{FP} + \mathit{TN})</math>
; false discovery rate (FDR)
:<math>\mathit{FDR} = \mathit{FP} / (\mathit{FP} + \mathit{TP}) = 1 - \mathit{PPV}</math>
; miss rate or false negative rate (FNR)
:<math>\mathit{FNR} = \mathit{FN} / (\mathit{FN} + \mathit{TP})</math>
----
; accuracy (ACC)
:<math>\mathit{ACC} = (\mathit{TP} + \mathit{TN}) / (P + N)</math>
; F1 score
: is the harmonic mean of precision and sensitivity
:<math>\mathit{F1} = 2\mathit{TP} / (2\mathit{TP} + \mathit{FP} + \mathit{FN})</math>
; Matthews correlation coefficient (MCC)
:<math>\frac{\mathit{TP} \times \mathit{TN} - \mathit{FP} \times \mathit{FN}}{\sqrt{(\mathit{TP}+\mathit{FP})(\mathit{TP}+\mathit{FN})(\mathit{TN}+\mathit{FP})(\mathit{TN}+\mathit{FN})}}</math>
; uncertainty coefficient, aka proficiency
:<math>\begin{align} L &= (P+N) \times \log(P+N) \\
LTP &= TP \times \log\frac{TP}{(TP+FP)(TP+FN)} \\
LFP &= FP \times \log\frac{FP}{(TP+FP)(FP+TN)} \\
LFN &= FN \times \log\frac{FN}{(TP+FN)(FN+TN)} \\
LTN &= TN \times \log\frac{TN}{(FP+TN)(FN+TN)} \\
LP &= P \times \log\frac{P+N}{P} \\
LN &= N \times \log\frac{P+N}{N} \\
UC &= \frac{L + LTP + LFP + LFN + LTN}{LP + LN}
\end{align}</math>
;Informedness = Sensitivity + Specificity − 1
;Markedness = PPV + NPV − 1
''Sources: Fawcett (2006) and Powers (2011).''
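The ratio definitions above can be collected into a short computational sketch. The function below is illustrative (its name and return format are not from the source); it computes each listed metric directly from the four confusion-matrix counts.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute the summary statistics defined above from the four cells."""
    p = tp + fn  # condition positives
    n = fp + tn  # condition negatives
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tp / p,              # sensitivity / recall
        "SPC": tn / n,              # specificity
        "PPV": tp / (tp + fp),      # precision
        "NPV": tn / (tn + fn),
        "FPR": fp / n,              # fall-out
        "FDR": fp / (fp + tp),      # = 1 - PPV
        "FNR": fn / (fn + tp),      # miss rate
        "ACC": (tp + tn) / (p + n),
        "F1":  2 * tp / (2 * tp + fp + fn),
        "MCC": mcc,
    }
```

For example, a hypothetical test with 90 true positives, 30 false positives, 10 false negatives, and 870 true negatives has sensitivity 0.90, precision 0.75, and accuracy 0.96.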
There are many metrics that can be used to measure the performance of a classifier or predictor; different fields prefer different metrics because they have different goals. For example, in medicine sensitivity and specificity are often used, while in computer science precision and recall are preferred. An important distinction is between metrics that are independent of prevalence (how often each category occurs in the population) and metrics that depend on it – both types are useful, but they have very different properties.
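A small numerical sketch (with hypothetical numbers) makes the distinction concrete: sensitivity and specificity are fixed properties of the test, yet the positive predictive value still collapses as the condition becomes rare.

```python
def ppv(sens, spec, prev):
    """Positive predictive value via Bayes' rule: P(condition | positive test)."""
    return (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))

# The same test (90% sensitivity, 90% specificity) at two prevalences:
balanced = ppv(0.9, 0.9, 0.5)   # prevalence 50% -> PPV 0.9
rare     = ppv(0.9, 0.9, 0.01)  # prevalence 1%  -> PPV about 0.083
```

With a rare condition, false positives from the large healthy population swamp the true positives, so most positive results are wrong even though the test itself is unchanged.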
==Contingency table==
One compares the classification of each item against a gold standard test, and cross tabulates the data into a 2×2 contingency table, comparing the two classifications. One then evaluates the classifier ''relative'' to the gold standard by computing summary statistics of these 4 numbers. Generally these statistics will be scale invariant (scaling all the numbers by the same factor does not change the output), making them independent of population size; this is achieved by using ratios of homogeneous functions, most simply homogeneous linear or homogeneous quadratic functions.
Say we test some people for the presence of a disease. Some of these people have the disease, and our test correctly says they are positive. They are called ''true positives'' (TP). Some have the disease, but the test incorrectly claims they don't. They are called ''false negatives'' (FN). Some don't have the disease, and the test says they don't – ''true negatives'' (TN). Finally, there might be healthy people who have a positive test result – ''false positives'' (FP). These can be arranged into a 2×2 contingency table (confusion matrix), conventionally with the test result on the vertical axis and the actual condition on the horizontal axis.
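The cross-tabulation described above can be sketched as follows; the helper function and its boolean-sequence inputs are illustrative, not from the source.

```python
from collections import Counter

def confusion(gold, test):
    """Tally TP/FN/FP/TN from parallel sequences of booleans (True = positive)."""
    cells = Counter()
    for g, t in zip(gold, test):
        if g and t:
            cells["TP"] += 1   # condition present, test positive
        elif g:
            cells["FN"] += 1   # condition present, test negative
        elif t:
            cells["FP"] += 1   # condition absent, test positive
        else:
            cells["TN"] += 1   # condition absent, test negative
    return cells
```

Each subject falls into exactly one of the four cells, so the four counts always sum to the sample size.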
These numbers can then be totaled, yielding both a grand total and marginal totals. Totaling the entire table, the numbers of true positives, false negatives, true negatives, and false positives add up to 100% of the set. Totaling the rows (adding horizontally), the numbers of true positives and false positives add up to 100% of the test positives, and likewise for negatives. Totaling the columns (adding vertically), the numbers of true positives and false negatives add up to 100% of the condition positives (and conversely for negatives). The basic marginal ratio statistics are obtained by dividing the 2×2 = 4 values in the table by the marginal totals (either rows or columns), yielding 2 auxiliary 2×2 tables, for a total of 8 ratios. These ratios come in 4 complementary pairs, each pair summing to 1, so each of these derived 2×2 tables can be summarized as a pair of 2 numbers, together with their complements. Further statistics can be obtained by taking ratios of these ratios, ratios of ratios of ratios, or more complicated functions.
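The two auxiliary tables of marginal ratios can be sketched directly (the function and dictionary keys are illustrative); note that the two ratios sharing a margin always sum to 1, as described above.

```python
def marginal_ratios(tp, fp, fn, tn):
    """Divide each cell by its column total and by its row total."""
    # Column-normalized (by actual condition): prevalence-independent pairs.
    by_condition = {"TPR": tp / (tp + fn), "FNR": fn / (tp + fn),
                    "FPR": fp / (fp + tn), "TNR": tn / (fp + tn)}
    # Row-normalized (by test result): prevalence-dependent pairs.
    by_test = {"PPV": tp / (tp + fp), "FDR": fp / (tp + fp),
               "NPV": tn / (fn + tn), "FOR": fn / (fn + tn)}  # FOR = 1 - NPV
    return by_condition, by_test
```

The eight ratios thus reduce to four independent numbers (one per complementary pair), matching the count of 4 pairs summarizing the 2 derived tables.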
The contingency table and the most common derived ratios are summarized below; the following sections give details.
Note that the columns correspond to the condition actually being positive or negative (or classified as such by the gold standard), as indicated by the color-coding, and the associated statistics are prevalence-independent, while the rows correspond to the ''test'' being positive or negative, and the associated statistics are prevalence-dependent. There are analogous likelihood ratios for prediction values, but these are less commonly used, and not depicted above.
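As a sketch of the likelihood ratios mentioned here (function name illustrative): the positive likelihood ratio divides the true positive rate by the false positive rate, and the negative likelihood ratio divides the false negative rate by the true negative rate.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) from the four confusion-matrix counts."""
    tpr, fnr = tp / (tp + fn), fn / (tp + fn)  # column-wise rates
    fpr, tnr = fp / (fp + tn), tn / (fp + tn)
    return tpr / fpr, fnr / tnr                # (LR+, LR-)
```

Because both ratios are built only from column-wise (condition-normalized) rates, they inherit the prevalence independence of sensitivity and specificity.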

Source: Wikipedia, the free encyclopedia ("Evaluation of binary classifiers").